feat: Improve search indexing reliability #1065

Merged: 9 commits, Feb 16, 2024

@MonkeyDo MonkeyDo commented Feb 16, 2024

This PR came about after #740 started causing the website to crash when editing/creating Work entities.

I took this opportunity to make the search indexing more reliable by improving error catching.
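
As a rough illustration of what "improving error catching" can look like, here is a minimal sketch that wraps an indexing call so a failure is reported instead of propagating to the caller. The names `indexSafely`, `indexFn` and `onError` are hypothetical and are not taken from this PR's code.

```typescript
// Hypothetical helper: run an indexing operation and report failures
// through a callback instead of letting them crash the calling request.
async function indexSafely(
  indexFn: () => Promise<void>,
  onError: (err: unknown) => void
): Promise<boolean> {
  try {
    await indexFn();
    return true;
  } catch (err) {
    // The entity itself may already be saved; only the indexing step failed.
    onError(err);
    return false;
  }
}
```

The boolean return value lets a caller decide whether to retry or schedule a reindex later; that policy is left open here.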

While rewriting some of that code, I realized we are indexing a lot of fields that we have no reason to index.
First, some internal ORM fields that are output by default when serializing to JSON.
Second, the entire entity info is stored in ElasticSearch, even though we load entities afresh from the DB based on the search result hits. At the moment we do absolutely nothing with all this entity info other than take up disk space, slow down indexing and transfer too much data.

So we are now "cleaning" the documents before indexing them, i.e. keeping only the fields we need.
In the future, we will want to store more information and return it directly instead of fetching from the DB for presentation, but that will require more rewriting.
If we are planning on moving to SOLR soon enough, that seems like a possible waste of time. A good candidate for a part 2 in any case.
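
To make the "cleaning" step concrete, here is a minimal sketch of an allow-list approach. It assumes only that the ORM model exposes a `toJSON()` method; the function name `cleanDocumentForIndexing` and the field list are illustrative assumptions, not the actual code or field set from this PR.

```typescript
// Assumption: the ORM model can be serialized with toJSON().
interface SerializableModel {
  toJSON(): Record<string, unknown>;
}

// Illustrative allow-list only; the real set of indexed fields is not
// stated in this PR description.
const INDEXED_FIELDS = ['bbid', 'id', 'type', 'name'] as const;

// Hypothetical utility: keep only allow-listed fields, dropping internal
// ORM props (such as _pivot… keys) and anything else search never reads.
function cleanDocumentForIndexing(
  model: SerializableModel
): Record<string, unknown> {
  const raw = model.toJSON();
  const cleaned: Record<string, unknown> = {};
  for (const field of INDEXED_FIELDS) {
    if (raw[field] !== undefined) {
      cleaned[field] = raw[field];
    }
  }
  return cleaned;
}
```

An allow-list is used rather than a deny-list so that fields added to the model later are not indexed by accident.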

Also removed some manipulation we did to shoehorn collection, editor and area ids into a "bbid" field.
The indexing code now supports an "id" field as well.
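
A minimal sketch of how a document identifier might be resolved once both fields are accepted. The `IndexableDocument` type and `getDocumentId` helper are hypothetical names used for illustration, and preferring "bbid" over "id" when both are present is an assumption.

```typescript
// Hypothetical document shape: entities carry a "bbid", while
// collections, editors and areas carry an "id" instead.
interface IndexableDocument {
  bbid?: string;
  id?: string | number;
}

// Hypothetical helper: resolve the identifier used for the search index
// without copying "id" into a fake "bbid" field.
function getDocumentId(doc: IndexableDocument): string {
  const identifier = doc.bbid ?? doc.id;
  if (identifier === undefined || identifier === null) {
    throw new Error('Document has neither a "bbid" nor an "id" field');
  }
  return String(identifier);
}
```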

Finally, since we are now passing ORM models over for indexing, some rewriting was necessary wherever we call the search indexing, and I took that opportunity to rewrite some messy chained promises to async/await syntax.
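
For readers unfamiliar with this kind of rewrite, here is a small before/after sketch. `saveEntity` and `indexEntity` are hypothetical stand-ins, not functions from this codebase; both versions behave the same, and only the control flow is easier to follow in the second.

```typescript
type Entity = { bbid: string; name: string };

// Hypothetical stand-ins for a DB save and a search indexing call.
const indexedIds: string[] = [];
function saveEntity(entity: Entity): Promise<Entity> {
  return Promise.resolve(entity);
}
function indexEntity(entity: Entity): Promise<void> {
  indexedIds.push(entity.bbid);
  return Promise.resolve();
}

// Before: nested promise chain, needed to keep `saved` in scope.
function saveAndIndexChained(entity: Entity): Promise<Entity> {
  return saveEntity(entity).then(saved =>
    indexEntity(saved).then(() => saved)
  );
}

// After: the same steps read top to bottom with async/await.
async function saveAndIndex(entity: Entity): Promise<Entity> {
  const saved = await saveEntity(entity);
  await indexEntity(saved);
  return saved;
}
```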

MonkeyDo and others added 9 commits February 7, 2024 17:51
We have been storing a ton of information in the search index that we just won't ever use, such as set ids and internal props from the ORM Model (_pivot…).
Instead, let's pass the ORM models along and create a new utility to strip the document to index down to what we actually need.
With the change from the previous commit (accepting an ORM model rather than JSON for search indexing), we need to rewrite the parts of the code that use the search indexing accordingly.

Taking this opportunity to rewrite some code from promises to async/await syntax.
Missed some places where we need to set attributes on the ORM models for editor, collections and other non-entity types.
Cheeky async/await rewrite to clarify some of it.
Not all entities have a BBID field now; some have "id" instead.
@MonkeyDo MonkeyDo merged commit 2622d7c into master Feb 16, 2024
4 checks passed
@MonkeyDo MonkeyDo deleted the search-indexing-issues branch February 16, 2024 13:04
MonkeyDo added a commit that referenced this pull request Jun 4, 2024
Instead of JSON representation.
See #1065